Topic Modeling based on Keywords and Context

نویسنده

  • Johannes Schneider
چکیده

Current topic models often suffer from discovering topics not matching human intuition, unnatural switching of topics within documents and high computational demands. We address these concerns by proposing a topic model and an inference algorithm based on automatically identifying characteristic keywords for topics. Keywords influence topic-assignments of nearby words. Our algorithm learns (key)word-topic scores and it self-regulates the number of topics. Inference is simple and easily parallelizable. Qualitative analysis yields comparable results to state-of-the-art models (eg. LDA), but with different strengths and weaknesses. Quantitative analysis using 9 datasets shows gains in terms of classification accuracy, PMI score, computational performance and consistency of topic assignments within documents, while most often using less topics.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...

متن کامل

Context-aware Modeling for Spatio-temporal Data Transmitted from a Wireless Body Sensor Network

Context-aware systems must be interoperable and work across different platforms at any time and in any place. Context data collected from wireless body area networks (WBAN) may be heterogeneous and imperfect, which makes their design and implementation difficult. In this research, we introduce a model which takes the dynamic nature of a context-aware system into consideration. This model is con...

متن کامل

Short Text Feature Enrichment Using Link Analysis on Topic-Keyword Graph

In this paper, we propose a novel feature enrichment method for short text classification based on the link analysis on topic-keyword graph. After topic modeling, we re-rank the keywords distribution extracted by biterm topic model (BTM) to make the topics more salient. Then a topic-keyword graph is constructed and link analysis is conducted. For complement, the K-L divergence is integrated wit...

متن کامل

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

متن کامل

The Effect of Cognitive Factors of Rhetorically Different Listening Tasks on L2 Listening Quality of Iranian Advanced EFL Learners

This study examined the effect of two different authentic topic-familiar rhetorical L2 listening tasks (expository and argumentative) differing in reasoning demand on the listening comprehension scores of a number of Iranian EFL advanced learners. Sixty homogeneous advanced learners were recruited based on their performance on an English Language Proficiency test (Fowler & Coe, 1976). Then they...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1710.02650  شماره 

صفحات  -

تاریخ انتشار 2017